1.1. Executive Summary

The  parent  AFOSR  grant  was  focused  on  developing  smart  optoelectronic  interconnection 
networks  that  combine  communication  and  processing  capabilities  in  network  hardware  to 
accelerate distributed computing applications. Under that program we have developed network
designs that can be efficiently implemented using optical interconnects. The smart network
architecture is compatible with asynchronous transfer mode (ATM) specifications. ATM is a
switching technique based on the emerging integrated services digital network (ISDN) standard, and it
can  be  used  in  telecommunication  networks  and  in  both  wide  and  local  area  computer  networks. 

The  objective  of  this  AASERT  award  was  to  support  a  PhD  thesis  to  develop  optoelectronic 
integrated circuits to interface with the smart network (e.g. a line unit circuit). The function of a
line unit is to provide an external interface, perform protocol-related functions, and implement
ATM cell buffering and contention management functions. The approach we have taken is to
develop a design methodology for building large-scale photonic page buffer integrated circuits.
These circuits can serve as an efficient interface to an optoelectronic interconnection network.
We have also uncovered numerous other applications for photonic page buffer circuits. With the
methodology we have developed, one can now design line circuits with application-specific
functionality, performance and cost.

2. Photonic Page Buffer Integrated Circuits

The result of our AASERT program is a design methodology that can be used to efficiently
implement large-scale Photonic Page Buffer (PPB) ICs with application-specific performance
and functionality requirements. The PPB chip has numerous potential applications including
photonic switching, non-volatile data storage (buffering), interfacing smart pixel systems with
electronic hosts, implementing cache memory for optical memory devices, template matching,
spatial format conversion, clock rate conversion, optical wavelength conversion, bandwidth
smoothing, clock synchronization, and independent flow control for input and output channels.
Although we developed this methodology based on the hybrid CMOS-SEED platform, we
anticipate that it is compatible with other emerging smart pixel technologies that use silicon
CMOS circuits. To verify the validity of our approach, a sixteen kilobit, 1008 optical I/O
channel, 32 page, 504 bits/page, 50 Mpage/sec photonic page buffer IC with random page access
capability was designed, fabricated and electrically tested. This 21 mm² IC was fabricated in 0.8
micron HP26G CMOS technology [1] and integrates 200,000 transistors. Our design is based on
the hybrid CMOS-SEED technology that integrates GaAs MQW photodetectors and modulators
with  high-volume  commodity  CMOS  VLSI  processes  [2].  It  uses  optical  device  pitch  and 
spacing  of  an  existing  GaAs  MQW  diode  mask  [3].  The  CMOS  optical  receiver  amplifier  and 
transmitter  driver  circuits  use  proven  circuits  as  well.  The  hybrid  CMOS-SEED  technology  was 
selected  based  on  its  availability,  maturity  and  experience. 


Since  our  design  takes  advantage  of  existing  electronic  memory  circuits,  it  is  highly  scalable.  With 
present 0.4 micron CMOS, a 16 megabit PPB chip can be built using SRAM circuits. With
future 0.18 micron CMOS technology, the capacity can be increased to 256 megabits [4]. Using
DRAM circuits, an additional 4X increase in memory size can be achieved at the cost of a longer
memory access time. The number of optical I/O channels is dependent on the optoelectronic
device  technology  and  the  on-chip  power  consumption  of  receiver/transmitter  circuits.  With 
present  CMOS-SEED  technology,  200-1000  optical  I/O  channels  can  be  achieved  with  power 
consumption  of  3-5  mW  per  channel  operating  at  50-100  Mbps/channel.  This  corresponds  to  a 
total throughput of 10-100 gigabits per second. In comparison, Rambus, which is a high-performance
electronic memory, achieves a throughput of 2 gigabits per second [5]. The work
carried out by this AASERT builds on our earlier effort that produced a two kilobit, 21,000
transistor photonic page buffer IC [6]. The 64 optical I/O channels on this IC were tested at
50 Mb/s/channel optical data throughput, corresponding to an aggregate optical data I/O
bandwidth of 3.2 Gb/s in a 1 mm² chip area.


3.  Applications  of  Photonic  Page  Buffer  ICs 

A block diagram for a PPB with M address lines, 2^M words, and N bits per memory word is shown in
figure 1. The inputs to the PPB include an N-bit data-input bus (DIN), an M-bit address bus (A),
a write enable (WR) control signal, and other optional application-specific control signals (see
section 4). An N-bit data-output bus (DOUT) forms the output of the PPB. The write data enters
the PPB optically through the DIN port and is optically read out at the DOUT port. The A bus is the
input port to address the 2^M words in the memory. Many applications exist for PPB chips with
random  page  access  capability.  Although  distinct  applications  may  require  a  slightly  different 
PPB  design,  the  design  methodology  presented  here  can  efficiently  realize  such  application- 
specific  requirements.  The  paragraphs  that  follow  describe  several  PPB  applications.  Due  to  the 
complexity  of  each  application,  no  detailed  description  will  be  attempted  here. 

Figure 1: Block diagram of a photonic page
buffer  IC  with  random  page  access  capability. 


Electronic interface for Photonic systems: Typically, smart pixel systems are electrically
connected to an electronic host computer. The PPB provides an efficient mechanism to
accomplish this task. In this application, data enters the PPB optically through the N-bit DIN
port. There, it is buffered and converted into a K-bit data stream that is electrically brought
outside the chip. In the reverse direction, an electrical K-bit data stream enters the chip. There, it
is buffered and assembled into an N-bit vector that is optically written to the DOUT port. For
example, a PPB design may have 1000 optical channels running at 100 Mbps/channel and 64
electrical channels running at 50 Mbps/channel.

Scratchpad memory for Smart Pixel Systems: Smart pixel designs often require high-speed
memory and the PPB can efficiently serve this purpose. For example, one smart pixel design [7]
uses eleven PPB chips as part of a twenty-chip high-performance FFT processor. This processor
can compute a new 1,024-point complex FFT every 0.44 µsec. In comparison, a high-performance
electronic system that uses 4 Sharp LH9124 FFT processor chips, 12 Sharp
LH9320 address generator chips, 12 SRAM chips, and various glue-logic chips requires 31 µsec
for the same computation [8].

Cache  memory  for  3-D  Optical  Memory  Devices:  The  use  of  smart-pixel  ICs  as  cache  memory 
for 3-D optical memory devices was recently proposed [9]. In this application, a smart pixel IC
provides an efficient interface to a low-latency, wide-word optical memory. The PPB design
methodology presented here can be applied to build the smart pixel ICs described in reference 9.

Photonic  Switching:  As  shown  in  figure  2,  the  PPB  can  implement  a  memory-based  time- 
division  switch  [10].  In  this  scheme,  different  channels  are  put  on  an  N-bit  optical  TDM  (time 
division  multiplexing)  bus,  with  successive  slots  containing  information  on  different  channels. 
The PPB acts as a central queue for incoming packets. An off-chip controller circuit serially
routes the incoming packets. For example, a single PPB chip with 1000 optical channels running
at 100 Mbps/channel can implement a 16 port switch running at 6.25 Gbps/port. The aggregate
throughput of this single-chip switch design will be 100 Gigabits/sec.
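
The port arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the bandwidth bookkeeping: the function names are hypothetical, and it assumes the aggregate optical bandwidth is shared equally among the ports with no allowance for controller overhead.

```python
def aggregate_rate_mbps(channels: int, channel_rate_mbps: float) -> float:
    """Aggregate optical bandwidth of the page buffer, in Mbps."""
    return channels * channel_rate_mbps


def per_port_rate_mbps(channels: int, channel_rate_mbps: float, ports: int) -> float:
    """Bandwidth available to each port, assuming an equal TDM share."""
    return aggregate_rate_mbps(channels, channel_rate_mbps) / ports


# Figures quoted in the text: 1000 channels at 100 Mbps, 16 ports.
total = aggregate_rate_mbps(1000, 100)
port = per_port_rate_mbps(1000, 100, 16)
print(f"aggregate: {total / 1000:.0f} Gbps, per port: {port / 1000:.2f} Gbps")

# Figure 2 case: 4 ports at 622 Mbps need about 2.5 Gbps in aggregate.
required = 4 * 622
print(f"required for 4 x 622 Mbps: {required} Mbps")
```

The same relation can be run in reverse to size the number of optical channels for a target port count and port rate.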

Photonic  FIFO:  First-in  first-out  (FIFO)  memories  help  interface  switching  and  computing 
systems by providing non-volatile storage (buffering), clock synchronization, asynchronous-to-synchronous
conversion, bandwidth smoothing, improved jitter/skew tolerance, and independent
flow control for input and output channels. By adding a small control circuit, the PPB can be
made to emulate a photonic FIFO memory [11].
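
One common way to emulate a FIFO on top of a random-access memory is to keep a write pointer, a read pointer, and an occupancy count. The sketch below illustrates that general idea in Python; it is not the control circuit of reference [11], and the class name `PageFifo` and its methods are hypothetical.

```python
class PageFifo:
    """FIFO built on a random-access page memory using two pointers."""

    def __init__(self, pages: int):
        self.ram = [None] * pages  # stands in for the PPB page memory
        self.pages = pages
        self.wr = 0                # next page address to write
        self.rd = 0                # next page address to read
        self.count = 0             # pages currently stored

    def push(self, page) -> bool:
        """Store one page; return False if the buffer is full."""
        if self.count == self.pages:
            return False           # full: the input side must be held off
        self.ram[self.wr] = page
        self.wr = (self.wr + 1) % self.pages
        self.count += 1
        return True

    def pop(self):
        """Return the oldest page, or None if the buffer is empty."""
        if self.count == 0:
            return None
        page = self.ram[self.rd]
        self.rd = (self.rd + 1) % self.pages
        self.count -= 1
        return page


fifo = PageFifo(pages=2)
results = [fifo.push("a"), fifo.push("b"), fifo.push("c")]
order = [fifo.pop(), fifo.pop(), fifo.pop()]
print(results, order)
```

Because the full and empty conditions are tracked separately, the input and output sides can be flow-controlled independently, which is the property the paragraph above relies on.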


Optical Data Format Conversion: The PPB can be used to convert between different optical
formats. For example, a PPB chip with 1000 optical input channels can read 1000-bit optical
planes at 1 kHz clock rate, convert this data to bit-serial format, and write it to a single optical
output channel at 1 GHz clock rate.

Template  Matching:  In  this  application,  the  PPB  memory  is  first  loaded  with  templates.  Then, 
optical  bit-plane  data  entering  the  PPB  is  compared  against  these  templates  to  find  the  best 
match.  Finally,  the  results  are  reported  to  the  host  controller.  This  application  can  be  performed 
at  high-speed  by  adding  logic  circuitry  to  the  basic  PPB  design  as  described  in  section  4. 


Figure 2: Single-chip TDM switch using photonic page buffer IC. For 4 ports
at 622 Mbps, the aggregate bandwidth of the PPB device must support 2.5 Gbps.


4.  Smart  Pixel  IC  Layout 

Smart pixel ICs integrate a rectangular array of optical transmitter and receiver devices on top of
an electronic integrated circuit. This integration can be monolithic or hybrid. For example, the
hybrid CMOS-SEED platform uses flip-chip bonding. The placement of optical devices directly
on top of integrated circuits is necessary to minimize the interconnect capacitance, and hence
power consumption, for interconnects between the IC and the optoelectronic device array. Using
a rectangular array of optoelectronic devices with a fixed pitch simplifies system design, and, in
the case of flip-chip bonding, leads to higher flip-chip yield as compared with random placement
approaches. It should be noted that random placement of optical transmitter and receiver devices can
be  effectively  accomplished  by  not  using  all  of  the  devices  in  the  rectangular  array.  The 
paragraphs  that  follow  review  chip  layout  methods  used  in  smart  pixels  and  introduce  the  chip 
layout  methodology  selected  for  our  photonic  page  buffer  design. 

devices bonded directly over active silicon circuits. The rectangular
array of pads in the center of the chip is for flip-chip bonding of
MQW diodes.


One  emerging  method  for  efficiently  combining  high-performance  electronic  circuit  layouts  with 
optoelectronic  devices  is  to  integrate  optical  devices  directly  over  active  silicon  VLSI  circuits 
(see figure 3). For example, a two kilobit, 21,000 transistor photonic page buffer IC organized as
a 32 words deep and 32 bits per word First-In First-Out (FIFO) memory was recently
demonstrated [6]. The 64 optical I/O channels on this IC were individually tested at
50 Mb/s/channel data throughput, corresponding to an aggregate optical data I/O bandwidth of
3.2 Gb/s in a 1 mm² chip area. Although this approach works well for moderately sized designs, it
has  several  limitations  when  applied  to  large  chips.  First,  the  length  of  the  electrical  wire 
connecting  optical  devices  to  the  transimpedance  amplifier  (for  optical  receiver  device)  or  to  the 
driver  (for  optical  transmitter  device)  increases  with  the  size  of  the  chip.  This  leads  to  higher 
interconnect  capacitance,  higher  power  consumption  and  increased  latency.  For  example,  the 
capacitance of a typical hybrid CMOS-SEED modulator is under 100 fF while a 0.5 mm
electrical wire has a capacitance in excess of 1000 fF. The second limitation is specific to our
photonic  page  buffer  design.  As  will  be  discussed  shortly,  our  design  uses  SRAM  circuits  to 
achieve  high  storage  density.  Internally,  SRAMs  use  weak  analog  signals  that  are  amplified  by 
sense  amplifiers  located  at  the  periphery  of  the  device.  To  ensure  signal  integrity  on  these  lines, 
routing  of  high-speed  digital  signals  directly  over  SRAMs  is  typically  not  permitted.  Thus  it 
becomes  difficult  to  integrate  optical  devices  directly  over  large  SRAM  circuits  and  at  the  same 
time  achieve  high-speed  operation. 

Figure 4. Example of “smart pixel” layout style where a single circuit is
replicated in a two-dimensional array. The layout shown implements a
4x4 array of smart pixels.


Another  popular  smart  pixel  layout  methodology  is  to  design  a  self-contained  smart  pixel 
circuit  with  electronic  processing  circuitry,  optical  transmitter/receiver  circuitry,  and  optical  I/O 
devices. This circuit is then replicated in a two-dimensional “smart pixel” array structure as
shown  in  figure  4.  While  this  approach  is  highly  effective  and  popular,  its  drawback  for  building 
large-scale  photonic  page  buffers  is  low  storage  density.  In  this  approach,  register  files  are 
typically  used  for  memory  function.  An  edge-triggered  register  uses  24  transistors  to  store  a 
single  bit,  as  compared  with  six  transistors  in  the  SRAM  circuit  [12].  Addition  of  random  page 
access capability would require additional control circuitry, further increasing the transistor
count per memory bit. On the other hand, the use of high-density SRAM circuits with this
approach is difficult because existing RAM layouts are not easily partitionable onto a “smart
pixel”  array  structure. 
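
The storage-density penalty can be made concrete with a short calculation. The sketch below uses the per-bit transistor counts quoted above and, purely as an illustration, the 32 page by 504 bit buffer size described elsewhere in this report. It counts storage cells only; decoders, sense amplifiers and control logic are ignored, and the function name is hypothetical.

```python
REGISTER_TRANSISTORS_PER_BIT = 24  # edge-triggered register, from the text
SRAM_TRANSISTORS_PER_BIT = 6       # six-transistor SRAM cell, from the text


def storage_transistors(bits: int, transistors_per_bit: int) -> int:
    """Transistors needed for the storage cells alone."""
    return bits * transistors_per_bit


bits = 32 * 504  # illustrative buffer size: 32 pages of 504 bits
register_total = storage_transistors(bits, REGISTER_TRANSISTORS_PER_BIT)
sram_total = storage_transistors(bits, SRAM_TRANSISTORS_PER_BIT)
print(f"register file: {register_total}, SRAM: {sram_total}, "
      f"ratio: {register_total / sram_total:.0f}x")
```

Under these assumptions the register-file cells alone would need about four times as many transistors as the SRAM cells for the same capacity.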


Figure  5:  The  proposed  layout  approach  integrates  receiver  amplifier  circuits, 
transmitter  driver  circuits,  and  optical  devices  within  a  photonic  interface 
module  (PIM),  placed  in  the  center  of  the  chip.  Existing  VLSI  layouts 
can  then  be  placed  around  the  PIM. 


Our  scheme  is  a  compromise  between  these  two  approaches.  It  places  the  array  of  optical  devices 
directly over their corresponding receiver amplifier and transmitter driver circuits. The resulting
two-dimensional array structure, called the photonic interface module, is located in the center of the
integrated circuit. The memory circuits are then placed around the photonic interface module as
shown in figure 5. With this approach, the electronic and the photonic portions of the IC can
be optimized separately. For example, the memory structure can use existing high-density SRAM
circuit layouts. On the other hand, the photonic interface array can be designed to locate the
driver/receiver circuitry near the optical devices with which they communicate to achieve optimum
performance. While we have described this layout scheme in the context of photonic page buffer ICs,
it can be applied to other large-scale smart pixel IC designs [7]. Finally, it should be pointed out that
this  layout  scheme  is  specifically  targeted  for  large-scale  smart  pixel  ICs  that  permit  optical 
device  arrays  with  tight  pitch,  such  as  the  hybrid  CMOS-SEED  platform. 

5. Architecture of Photonic Page-Buffer IC

5.1.  Memory  Architecture  Selection 

A  number  of  efficient  circuit  designs  exist  for  memory  structures  including  dynamic  RAM 
(DRAM), static RAM (SRAM), First-In First-Out (FIFO), stack, Read-Only Memory (ROM),
and register file. For our photonic page buffer design, we have selected the SRAM circuit based
on its high-speed, low-power consumption, and high storage density characteristics. Efficient
circuit  designs  for  SRAMs  are  readily  available  [12]  and  they  can  be  fabricated  by  commercial 
silicon  foundries.  With  small  additional  controller  circuitry,  SRAMs  can  efficiently  emulate 
FIFO  and  stack  functionality  [11].  The  paragraphs  that  follow  describe  the  considerations  that 
led  to  the  selection  of  the  SRAM  over  other  memory  structures  for  our  PPB  design. 

A  DRAM  uses  a  single  transistor  circuit  for  storing  one  memory  bit;  thus,  it  achieves  the  highest 
storage  density  as  compared  with  other  memory  circuits  [12].  On  the  other  hand,  efficient 
implementation  of  DRAM  circuitry  requires  special  processing  steps  that  are  typically  not 
available  from  commercial  silicon  foundries.  Also,  DRAMs  are  typically  slower  than  SRAMs 
due to the increased sensitivity requirements placed on their sense amplifiers. For these reasons we
have  chosen  not  to  use  the  DRAM  circuit  in  our  photonic  page  buffer  design. 

Both  the  FIFO  and  stack  circuits  are  high-speed  circuit  designs  that  are  well  suited  for  small- 
scale  designs  [12].  Scaling  these  circuits  to  large  capacity  is  difficult  due  to  (1)  high  power- 
consumption,  since  all  transistors  in  the  circuit  can  change  state  during  a  memory  access  and  (2) 
low storage density, due to the large number of transistors required for storing one memory bit. For
these  reasons,  we  have  chosen  not  to  use  these  circuits  for  our  photonic  page  buffer  design. 


Finally, although the register file is the fastest memory circuit, it was not selected for the photonic
page  buffer  design  because  of  low  density  and  high  power  consumption.  On  the  other  hand,  the 
ROM  was  not  chosen  because  of  its  limited  application  since  ROM  contents  cannot  be  changed 
after  the  chip  is  fabricated. 


5.2.  Architecture  of  High-Speed  Static  RAM 

A block diagram for an asynchronous static RAM circuit with M address lines, 2^M words, and N bits per
memory word was shown in figure 1. The inputs to the RAM include an N-bit data-input bus
(DIN), an M-bit address bus (A), a write enable (WR) control signal, and the optional output
enable (OE) control signal. An N-bit data-output bus (DOUT) forms the output of the RAM. The
write data enters the RAM through the DIN port and is read out at the DOUT port. The A bus is the
input port to address the 2^M words in the memory.


Writing to the RAM is performed in three steps. First, the address bus (A) is set to the memory
address that is being written to. At the same time, the data-input (DIN) bus is set to the data to be
written. Second, the write signal (WR), which normally stays high, is pulled low and held there
for a specified time period (t_wp). Third, when the WR signal goes high, the data on the DIN bus
is written to memory. As is typical for edge-triggered storage circuits, the DIN bus must remain
unchanged for a specified amount of time before (i.e. setup time) and after (i.e. hold time) the
rising edge of the WR signal.


Reading  from  the  RAM  is  accomplished  in  two  steps.  First,  the  address  bus  (A)  is  set  to  the 
memory address that is being read from. Second, after a specified amount of time (t_acc), the
contents  of  that  memory  location  appear  on  the  data-output  (DOUT)  bus. 
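
The write and read sequences described above can be summarized in a small behavioral model. This is a simplified sketch, not the circuit itself: the class `AsyncSram` and its method names are hypothetical, and timing parameters (write pulse width, setup and hold times, access time) are not modeled. Only the ordering of events is captured.

```python
class AsyncSram:
    """Behavioral model of an asynchronous SRAM with an active-low WR signal."""

    def __init__(self, address_bits: int, word_bits: int):
        self.word_mask = (1 << word_bits) - 1
        self.mem = [0] * (1 << address_bits)  # 2^M words
        self.a = 0      # address bus
        self.din = 0    # data-input bus
        self.wr = 1     # write enable, normally high

    def set_inputs(self, a: int, din: int = 0) -> None:
        """Drive the address bus and, for a write, the data-input bus."""
        self.a = a
        self.din = din & self.word_mask

    def set_wr(self, level: int) -> None:
        """Change WR; data is captured on the rising edge (low to high)."""
        if self.wr == 0 and level == 1:
            self.mem[self.a] = self.din
        self.wr = level

    @property
    def dout(self) -> int:
        """Data-output bus; in hardware it is valid one access time after A settles."""
        return self.mem[self.a]


ram = AsyncSram(address_bits=5, word_bits=4)
ram.set_inputs(a=3, din=0b1010)  # write step 1: drive A and DIN
ram.set_wr(0)                    # write step 2: pull WR low and hold
ram.set_wr(1)                    # write step 3: rising edge stores the data
ram.set_inputs(a=3)              # read step 1: drive A
value = ram.dout                 # read step 2: sample DOUT
print(bin(value))
```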

Figure 6: General organization for the static RAM circuit.

Since  the  RAM  does  not  allow  simultaneous  read  and  write  access,  the  DIN  and  DOUT  buses 
can  use  the  same  physical  wiring,  leading  to  reduced  on-chip  wiring  density.  In  this  case,  the 
DOUT bus must use tri-state drivers that are set to high-impedance state when the RAM is being
written to. An additional input control signal, called output enable (OE), is added to the RAM
design to control these tri-state drivers.

Because of the size and complexity of RAM circuitry, no detailed discussion will be attempted
here [12]. Instead, we will describe the general layout of the RAM; how the components of the
RAM are implemented for variable word size, memory size, bits-per-column and tri-state output;
and how these components are brought together into a physical layout. Our discussion
emphasizes the aspects of the RAM layout that are relevant for photonic page buffer IC
implementation.

Figure 6 shows the general layout of a high speed static RAM. The heart of the RAM is a
memory array. To the left of the array is the row decode circuitry, the address buffers, and the
control signal buffers. Below the array is the column decode, read/write circuitry, and
input/output circuitry. The paragraphs that follow describe these components of the RAM.

Figure 7: Memory array organization. This example
shows  a  memory  with  32  four-bit  words  and  a  bpc  of  4. 


The memory array is a grid of memory cells, each storing 1 bit using a 6 transistor circuit. As
figure 7 shows, the memory cell consists of two cross-coupled inverters and two pass transistors.
The memory cells are placed in a two-dimensional array reflecting user specified word size,
number of words and bits per column. Word size (bpw) determines the number of multi-bit
columns. Number of words (words) and bits-per-column (bpc) determine the number of rows.
For a given RAM capacity and word size, the bits-per-column parameter allows the designer to
control the aspect ratio (ratio of height to width) of the RAM layout. This parameter is used in
our methodology to adjust the RAM aspect ratio to match the size of the photonic interface
array. Figure 7 shows a memory array containing 32 four-bit words with a bpc of four. The array
is filled from left to right, and from bottom to top. Thus the first word occupies bit 0 of each
column of row 0, the second word occupies bit 1 of each column of row 0, and so on, with every
fifth word moving up to the next row. Addressing a word consists of selecting the row and the
bit column occupied by the word. This is simplified by requiring that the bpc always be a
power of 2. This way, the N lower bits of the address, where N = log2(bpc), will always
correspond exactly to the column bit number and the remaining bits of the address will always
correspond exactly to the row number.
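
The address split described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that bpc is a power of two, so that log2(bpc) is an integer, and that the number of words divides evenly by bpc.

```python
from typing import Tuple


def array_shape(words: int, bpw: int, bpc: int) -> Tuple[int, int]:
    """Return (rows, cells per row) of the memory array."""
    if bpc <= 0 or (bpc & (bpc - 1)) != 0:
        raise ValueError("bpc must be a power of two")
    if words % bpc != 0:
        raise ValueError("words must be a multiple of bpc")
    # bpw multi-bit columns, each bpc cells wide; words/bpc rows.
    return words // bpc, bpw * bpc


def decode_address(address: int, bpc: int) -> Tuple[int, int]:
    """Split a word address into (row number, column bit number)."""
    low_bits = bpc.bit_length() - 1   # log2(bpc) for a power of two
    column_bit = address & (bpc - 1)  # low-order address bits
    row = address >> low_bits         # remaining address bits
    return row, column_bit


# The example of 32 four-bit words with a bpc of 4.
print(array_shape(words=32, bpw=4, bpc=4))
for addr in (0, 1, 4, 31):
    print(addr, decode_address(addr, bpc=4))
```

Changing bpc while holding the capacity fixed trades rows for cells per row, which is how the aspect ratio of the RAM is adjusted.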


Several control signals in the RAM, such as the output enable (OE) signal, have large fanout. To
ensure high-speed operation, these signals are driven using a cascaded chain of optimally sized
inverters [12]. This takes place in the control buffer. On the other hand, the internal column
data-output  lines  (DL  and  DLB  in  figure  11)  carry  weak  analog  signals  that  are  amplified  to 
digital  levels  using  sense  amplifier  circuits.  To  maintain  signal  integrity  on  these  lines,  routing 
of  high-speed  digital  signals  directly  over  the  RAM  circuit  is  typically  not  permitted.  Our  design 
methodology  for  photonic  page  buffer  ICs  does  not  violate  this  routing  constraint.  This 
completes  the  discussion  of  the  static  RAM  circuit. 


5.3. Architecture of Photonic Interface Module

As  discussed  previously,  the  photonic  interface  module  integrates  optical  devices  directly  over 
the  corresponding  transmitter-driver/receiver-amplifier  circuitry  (see  figure  5).  Its  purpose  is  to 
convert  between  off-chip  optical  bit-plane  data  and  on-chip  digital  electrical  signal  formats.  The 
resulting  electrical  signals  are  routed  to  the  periphery  of  the  photonic  interface  module  for 
connection  with  memory  circuits.  The  interconnect  density  and  power  consumption  associated 
with this wire routing are important issues when a large number of optical I/O channels is used.
For example, consider a 2 mm x 2 mm photonic interface module for hybrid CMOS-SEED
technology. In 0.8 micron HP26G CMOS process, with 2.6 micron interconnect pitch, 3,077
signals can be brought to the periphery of the module. In practice, the actual number of I/O
signals that can be supported by this interface module will be lower since the transmitter-driver
and the receiver-amplifier circuits with properly sized power rails occupy 2500 µm² of area
(see section 5). This reduces the number of I/O signals to 1,600 (4 mm² / 2500 µm²). Using a
conservative estimate that the average routing wire length is 4 mm, the power consumption per
routing wire at 100 MHz clock rate becomes:


P_diss = ... · F = ... = 1 mWatt
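
A figure of this size is consistent with the common CMOS switching-power relation P = C · V² · F, although the exact expression used in the original equation is not recoverable here. The sketch below is illustrative only: the 0.1 pF/mm wire capacitance and the 5 V supply are assumptions chosen because they reproduce the 1 mW result, not values stated in this report, and the wire is assumed to switch once per clock cycle.

```python
def switching_power_watts(cap_farads: float, supply_volts: float, clock_hz: float) -> float:
    """Dynamic power of a wire that charges and discharges once per clock cycle."""
    return cap_farads * supply_volts ** 2 * clock_hz


WIRE_LENGTH_MM = 4.0     # average routing wire length, from the text
CLOCK_HZ = 100e6         # 100 MHz clock rate, from the text
CAP_PER_MM = 0.1e-12     # ASSUMED wire capacitance in farads per mm
SUPPLY_5V = 5.0          # ASSUMED supply voltage for the 0.8 micron process
SUPPLY_3V = 3.0          # 3 V supply cited in the report for HP14TB

wire_cap = WIRE_LENGTH_MM * CAP_PER_MM
p5 = switching_power_watts(wire_cap, SUPPLY_5V, CLOCK_HZ)
p3 = switching_power_watts(wire_cap, SUPPLY_3V, CLOCK_HZ)
print(f"{p5 * 1e3:.2f} mW at 5 V, {p3 * 1e3:.2f} mW at 3 V")
```

Because the power scales with the square of the supply voltage, moving from 5 V to 3 V would cut the per-wire dissipation to roughly a third under these assumptions.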

Increase  in  the  number  of  I/O  signals  can  be  achieved  using  larger  photonic  interface  modules  or 
smaller feature size CMOS technology. For example, in 0.5 micron HP14TB CMOS process
[13] with 1.4 micron interconnect pitch, a 4 mm x 4 mm photonic interface module can support
in excess of 10,000 optical I/O signals. In addition, the power consumption will be reduced since
HP14TB operates with a 3 V power supply.
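
The signal counts quoted in this section follow from simple geometry, which the sketch below reproduces. The function names are hypothetical. The perimeter limit is rounded to the nearest integer to match the 3,077 figure in the text, and the calculation assumes a single routing layer with wires leaving on all four sides of the module.

```python
def perimeter_limited_signals(side_x_um: float, side_y_um: float, wire_pitch_um: float) -> int:
    """Wires that fit along the module perimeter at the given pitch."""
    perimeter_um = 2 * (side_x_um + side_y_um)
    return round(perimeter_um / wire_pitch_um)


def area_limited_signals(side_x_um: float, side_y_um: float, cell_area_um2: float) -> int:
    """Driver/receiver cells that fit inside the module area."""
    return int(side_x_um * side_y_um / cell_area_um2)


# 2 mm x 2 mm module in HP26G: 2.6 micron pitch, 2500 square micron cells.
print(perimeter_limited_signals(2000, 2000, 2.6))
print(area_limited_signals(2000, 2000, 2500))

# 4 mm x 4 mm module in HP14TB: 1.4 micron pitch.
print(perimeter_limited_signals(4000, 4000, 1.4))
```

For the 2 mm module the cell area, not the wiring pitch, is the tighter of the two limits.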


5.4. Architecture of Photonic Page Buffer IC

The chip floorplan for a generic photonic page buffer IC was shown in figure 5. The photonic
interface module, located at the center of the chip, provides 2N optical channels: N
inputs and N outputs. An N-bit wide and 2^M-word deep RAM is divided into four N/4-bit RAM
banks that surround the photonic interface module. The RAM bpc design parameter is adjusted
to achieve high packing density. Between the RAM bank and the photonic interface module
there is an optional set of N/4 vectored cells, called a datapath, that contain application-specific
logic circuitry. Typically, the datapath is “folded” into multiple columns to match the height of
the RAM bank. The datapath concept allows one to add application-specific functionality to the
photonic page buffer design, including boundary scan testing, electrical read/write RAM
interface, parity generation and checking, CRC error checking and correction, and data coding.

The  M-bit  address  bus,  the  RAM  write  enable  control  signal,  the  optional  RAM  output  enable 
control  signal,  and  the  datapath  control  signals  are  routed  to  the  periphery  of  the  chip  for 
electrical connection. The write data enters the photonic page buffer optically through an N-bit
DIN port and is optically read out at the N-bit DOUT port. In this design, the control signals and
the M address bits are controlled electrically. A number of variations are possible for the basic
design  discussed  here.  For  example,  RAM  addressing  and  control  can  be  performed  optically. 
Alternatively,  with  VCSEL-based  technologies  it  may  be  appropriate  to  use  fewer  optical 
channels  operating  at  higher  speed.  In  this  case,  the  number  of  electrical  channels  on  the  chip 
will  exceed  the  number  of  optical  I/O  channels,  and  additional  multiplexing  circuits  will  be 
required  to  convert  between  these  two  formats. 

5.5.  Scalability  and  performance  analysis 

Since  our  design  takes  advantage  of  existing  electronic  RAM  circuits,  its  storage  capacity  is  highly 
scalable.  For  example,  16-Mbit  static  RAM  devices  with  access  times  of  10  to  15  nanoseconds 
have been demonstrated in 0.4 micron CMOS technology [14,15,16]. With future 0.18 micron
CMOS technology, the total capacity can be increased to 256 megabits [4]. Although electronic
static RAM ICs use a small number of external I/O channels, internally, the RAM is a
parallel device capable of supporting large word sizes as discussed in section 4.2. This
characteristic makes these devices suitable for large-scale photonic page buffers with high
density and large word size. Using DRAM circuits instead of SRAMs, an additional 4X increase
in memory size can be achieved at the cost of a longer memory access time. For example,
64-Mbit dynamic RAM devices with access times of 30 to 50 nanoseconds have been demonstrated
in 0.4 micron CMOS technology [17,18]. With 0.25 micron CMOS technology, 256-Mbit
dynamic RAM devices become possible [19]. With future 0.18 micron CMOS technology the
total DRAM capacity is projected at 1,000 megabits [4]. The drawback of DRAM circuits is
their  requirement  for  specialized  CMOS  processing  which  may  be  incompatible  with  logic  and 
analog  amplifier  circuits  used  in  our  PPB  design  methodology.  Finally,  the  power  consumption 
for CMOS RAM circuits is extremely low.

The number of optical I/O channels in the PPB is dependent on the optoelectronic device
technology and the on-chip power consumption of receiver/transmitter circuits. For example,
with current CMOS-SEED technology, 200-1000 optical I/O channels can be achieved with
power consumption of 3-5 mW per optical channel operating at 50-100 Mbps/channel [6,7]. This
corresponds to a total throughput of 10-100 gigabits per second and on-chip power consumption
of 1 to 5 Watts. In comparison, Rambus, which is a high-performance electronic memory,
achieves a throughput of 2 gigabits per second [5]. Ultimately, optical pin-count may be limited by
other considerations such as optical channel bit-error rate and overall system design
considerations.
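
The throughput and power ranges quoted above follow from multiplying the per-channel figures by the channel count. The short sketch below reproduces that arithmetic; the function name and argument names are illustrative choices made here, not part of the cited work.

```python
def aggregate(channels, mbps_per_channel, mw_per_channel):
    """Return (total throughput in Gb/s, on-chip power in W) for an optical I/O array."""
    throughput_gbps = channels * mbps_per_channel / 1000.0
    power_w = channels * mw_per_channel / 1000.0
    return throughput_gbps, power_w


# Low and high corners of the CMOS-SEED ranges quoted in the text.
low = aggregate(channels=200, mbps_per_channel=50, mw_per_channel=3)
high = aggregate(channels=1000, mbps_per_channel=100, mw_per_channel=5)

print(low)
print(high)
```

The low corner evaluates to 10 Gb/s at 0.6 W and the high corner to 100 Gb/s at 5 W, so the "1 to 5 Watts" figure in the text appears to round the lower bound upward.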

6. A 16-Kb Photonic Page-Buffer IC

To verify the validity of our methodology, a sixteen kilobit, 1008 optical I/O channel, 32 page,
504 bits/page, 50 Mpage/sec PPB IC with random page access capability was designed, fabricated
and electrically tested. This 21 mm² IC was fabricated in 0.8 micron HP26G CMOS technology
and integrates 200,000 transistors. Our design is based on the hybrid CMOS-SEED technology. It
uses optical device pitch and spacing of an existing GaAs MQW diode mask [3]. The CMOS
single-beam optical receiver amplifier and single-beam transmitter driver circuits use proven
designs as well. In particular, our receiver uses a design with a demonstrated maximum bit-rate of
375 Mb/s with a power consumption of 3.5 mW and minimum switching energy of oOu [20].
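
The headline parameters of the chip are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below is only a sanity check; the constant names are chosen here for illustration, and the per-port bandwidth is a peak figure implied by the quoted page rate, not a measured result.

```python
PAGES = 32             # pages stored on chip
BITS_PER_PAGE = 504    # bits per page (one optical data plane)
PAGE_RATE_HZ = 50e6    # 50 Mpage/sec design target

capacity_bits = PAGES * BITS_PER_PAGE              # "sixteen kilobit" capacity
address_bits = (PAGES - 1).bit_length()            # bits needed to select a page
optical_io = 2 * BITS_PER_PAGE                     # input plane plus output plane
peak_port_gbps = BITS_PER_PAGE * PAGE_RATE_HZ / 1e9  # implied peak, per optical port

print(capacity_bits, address_bits, optical_io, peak_port_gbps)
```

This gives 16,128 bits of storage, 5 address bits, 1,008 optical I/O channels, and an implied peak of about 25.2 Gb/s per optical port.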


Figure 8: Photomicrograph of the 16-Kb photonic page
buffer  IC  fabricated  in  HP26G  CMOS  for  hybrid  CMOS-SEED. 

Figure 8 shows the layout of the 16-Kb PPB chip. The photonic interface module (PIM) is 2 mm x
1 mm in size and supports 1,008 optical I/O channels. The 504 input channels are arranged in an
18 x 28 diode array located in the upper 1 mm x 1 mm portion of the PIM that uses 35 micron
vertical and 65 micron horizontal pitch. The 504 output channels use a similar layout and are
placed in the lower 1 mm x 1 mm portion of the PIM. Optical receiver and transmitter circuits are
placed directly under the MQW diode that they service. Because of the tight MQW diode pitch, only
2275 µm² is available for these circuits. The 1,008 electrical signals are routed to the east and
west sides of the PIM. The use of two PIM sides, rather than four, for routing electrical signals
was selected to reduce chip fabrication costs and to simplify power distribution for the PIM. The
PPB chip uses two memory banks, organized as 32-bit deep 252-bit wide RAMs with a bpc value
of 1, that are placed around the PIM. A datapath module, having 252 vectored logic cells, is
placed between each memory bank and the PIM. The datapath module is folded into three
columns, each column having 84 logic cells, to match the height of the memory bank. Because the
PIM receiver circuits use a transimpedance amplifier design with static power dissipation of
3.5 mW, the power rail routing and sizing for a PIM with 504 receivers has to be carefully
considered. Our design uses wide power rails (100X minimum metal width) and multiple
redundant power pins to ensure that the PIM is adequately powered.
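
The layout numbers in this paragraph can be cross-checked against one another. The sketch below does so under the assumption that the per-diode circuit area is simply the product of the two pitches; the variable names are illustrative, and the total static receiver power is a derived estimate rather than a figure quoted in the text.

```python
ARRAY_DIMS = (18, 28)        # diode array per direction (input or output)
PITCH_UM = (35, 65)          # vertical and horizontal diode pitch in microns
RECEIVER_STATIC_MW = 3.5     # static dissipation of one transimpedance receiver
DATAPATH_CELLS = 252         # logic cells per datapath module
FOLD_COLUMNS = 3             # datapath module is folded into three columns

channels_per_direction = ARRAY_DIMS[0] * ARRAY_DIMS[1]
cell_area_um2 = PITCH_UM[0] * PITCH_UM[1]           # area under one diode
receiver_static_w = channels_per_direction * RECEIVER_STATIC_MW / 1000.0
cells_per_column = DATAPATH_CELLS // FOLD_COLUMNS

print(channels_per_direction, cell_area_um2, receiver_static_w, cells_per_column)
```

The 504 receivers alone account for roughly 1.8 W of static dissipation inside a 1 mm x 1 mm region, which illustrates why the power rails and power pins need the care described above.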


Figure 9 shows the functional diagram for the 16-Kbit PPB design. A 504-bit tri-state bus connects
all the components on the chip. These components include the RAM, the datapath, the optical
output array, and the optical input array. The control signals and the 5 address bits are controlled
electrically. As discussed earlier, the logic circuits in the datapath can add application-specific
functionality to the basic PPB design. In our case, each datapath cell contains tri-state drivers for
driving the bus, a 2-to-1 multiplexer circuit, and an edge-triggered flip-flop with parallel load and
serial shift capability. The shift-in and shift-out lines of flip-flops in adjacent datapath cells are
connected, forming a 504-bit shift-register. The shift-in and shift-out lines for this register are
electrically brought outside of the chip. This datapath scheme permits electrical boundary scan
test capability and permits serial electrical access to the PIM and the RAM. The
paragraphs that follow describe the various modes of operation possible with the 16-Kb PPB
design.
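
The behavior of the chained datapath flip-flops can be illustrated with a small software model. This is a behavioral sketch only, not the circuit itself: the class and function names are hypothetical, and the bit ordering (which end of the chain is shift-in) is an assumption made here for illustration.

```python
class ShiftRegister:
    """Behavioral model of chained flip-flops with parallel load and serial shift."""

    def __init__(self, width=504):
        self.width = width
        self.bits = [0] * width

    def parallel_load(self, word):
        """Load every cell at once, as when a page is read from the bus."""
        if len(word) != self.width:
            raise ValueError("word width does not match register width")
        self.bits = list(word)

    def shift(self, shift_in):
        """One clock: the last cell drives shift-out, shift_in enters the first cell."""
        shift_out = self.bits[-1]
        self.bits = [shift_in] + self.bits[:-1]
        return shift_out


def serial_load(reg, word):
    """Shift a whole word in, one clock per bit; returns the number of clocks used."""
    for bit in reversed(word):   # assumed ordering so that reg.bits ends equal to word
        reg.shift(bit)
    return len(word)


def serial_unload(reg):
    """Shift a whole word out through shift-out and restore its original order."""
    out = [reg.shift(0) for _ in range(reg.width)]
    return out[::-1]


reg = ShiftRegister(width=504)
page = [i % 2 for i in range(504)]
clocks = serial_load(reg, page)
print(clocks, reg.bits == page)
```

Loading a full page serially takes one clock per bit, so 504 clocks for the 504-bit register, while a parallel load or store moves the whole page at once.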

Electrical read/write memory access: Here the PPB chip is used as a 16-Kb RAM with a bit-serial
electrical interface. Writing data requires multiple clock cycles. First, 504 clock cycles are used
to serially load the 504-bit register. Next, the contents of the register are written in parallel to the
desired RAM address as discussed in section 4.2. Reading data uses a similar approach.

Optical read/write access: Here the PPB chip is used as a photonic RAM with random access to
32 504-bit pages. The write data enters the photonic page buffer optically through a 504-bit
DIN port and is optically read out at the 504-bit DOUT port.

Electro-optical read/write access: This mode combines the two operations described above to
translate between electrical and optical data formats.

Optical test mode: This mode was developed to allow simple parallel testing of the optical device
array. Here, the optical input channels are directly connected to their corresponding output
channels. Thus an incoming 504-bit optical plane of 1's produces an output 504-bit optical plane
of 1's, permitting one to quickly locate defective optical devices.

Spatial Format Converter: Here the 16-Kb PPB is optically loaded with 504-bit optical data
planes. The shift-register is used to multiplex this data onto K optical output channels, where
K<504. Alternatively, the reverse of this operation is also possible. For example, the 16-Kb PPB can
be used to input a bit-serial optical stream at 300 Mbps, convert it into parallel format, and output a
504-bit optical data plane at 0.6 Mbps.
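
The operating modes described above can be summarized in a toy behavioral model. Everything in this sketch is illustrative: the class and method names are hypothetical, the single clock charged for the parallel RAM write is an assumption, and timing, addressing logic, and optics are not modeled.

```python
class PhotonicPageBuffer:
    """Toy model of the 16-Kb PPB: 32 pages of 504 bits."""

    def __init__(self, pages=32, width=504):
        self.width = width
        self.ram = [[0] * width for _ in range(pages)]

    def _check(self, plane):
        if len(plane) != self.width:
            raise ValueError("data plane width does not match page width")

    def optical_write(self, address, plane):
        """Optical write: a full plane arriving at DIN is stored in one page."""
        self._check(plane)
        self.ram[address] = list(plane)

    def optical_read(self, address):
        """Optical read: one page is presented at DOUT as a full plane."""
        return list(self.ram[address])

    def electrical_write(self, address, serial_bits):
        """Electrical write: fill the register bit-serially, then store it in parallel.

        Returns the clock count: one clock per shifted bit plus one clock for
        the parallel write (the extra write clock is an assumption).
        """
        bits = list(serial_bits)
        self._check(bits)
        self.ram[address] = bits
        return self.width + 1

    def optical_test(self, plane):
        """Test mode: each input channel drives its corresponding output channel."""
        self._check(plane)
        return list(plane)


def plane_rate_mbps(serial_rate_mbps, width=504):
    """Rate at which full planes are produced from a bit-serial stream."""
    return serial_rate_mbps / width


ppb = PhotonicPageBuffer()
pattern = [i % 2 for i in range(504)]
write_clocks = ppb.electrical_write(3, pattern)   # electrical in
plane_out = ppb.optical_read(3)                   # optical out
print(write_clocks, plane_out == pattern, round(plane_rate_mbps(300), 1))
```

The last value shows where the 0.6 Mbps figure comes from: a 300 Mbps bit-serial stream fills one 504-bit plane about 0.6 million times per second.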

Electrical testing of the 16-Kb PPB chip was performed using a custom printed circuit board with
a XILINX FPGA acting as the RAM controller and using a 64-channel 50 MHz digital logic
analyzer. Figure 10 shows a 100 Mbps electrical bit pattern being written into the PPB chip.
Electrical testing has demonstrated proper function of the electrical circuits at 100 MHz clock
rates. Static chip power consumption was measured at 2.5 Watts as expected. Although optical
testing of the 16-Kbit PPB was not performed, we anticipate that the optical devices and
associated driver circuits will be able to operate at 50 MHz since they use proven designs that
have shown 300 Mb/s optical operation in previous designs.

Figure 10: High-speed test results showing page readout at 100 MHz clock rate. Waveform (a)
shows the output of the shift register containing the MSB of the SRAM word being read, while
waveform (b) shows the incoming 100 MHz clock signal.